{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gym"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Naive implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2018-01-25 23:44:39,711] Making new env: CartPole-v0\n",
      "[2018-01-25 23:44:39,991] You are calling 'step()' even though this environment has already returned done = True. You should always call 'reset()' once you receive 'done = True' -- any further steps are undefined behavior.\n"
     ]
    }
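   ],
   "source": [
    "env = gym.make('CartPole-v0')\n",
    "env.reset()\n",
    "\n",
    "# Naive loop: take random actions and ignore everything step() returns,\n",
    "# so the environment is never reset once an episode ends (hence the warning).\n",
    "for _ in range(1000):\n",
    "    env.render()\n",
    "    env.step(env.action_space.sample())"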
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding `env.step`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the gym documentation explains, an environment runs in episodes, and `done=True` signals that the current episode has ended. Once that happens we must call `env.reset()` before stepping again, which is what the naive loop above fails to do. To handle this we need to understand what `env.step(action)` does and returns. `env.step(action)` advances the environment by one timestep, performing the action given by `action`, and returns a tuple of four values:\n",
    "- `observation`: An environment-specific object representing our observation of the environment after the action has been taken.\n",
    "- `reward`: The reward received for performing the action.\n",
    "- `done`: A boolean flag indicating whether the episode has ended. We need to check it after every step and call `env.reset()` when it is `True`.\n",
    "- `info`: Additional diagnostic information, useful for debugging."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2018-01-25 23:45:47,428] Making new env: CartPole-v0\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-0.02610098  0.02090901  0.01410614  0.03953635]\n",
      "[-0.0256828   0.21582587  0.01489687 -0.24866279]\n",
      "[-0.02136628  0.02049437  0.00992361  0.04868146]\n",
      "[-0.0209564  -0.17476846  0.01089724  0.34447878]\n",
      "[-0.02445176 -0.37004372  0.01778682  0.64057801]\n",
      "[-0.03185264 -0.56540907  0.03059838  0.93880872]\n",
      "[-0.04316082 -0.37071269  0.04937455  0.65589538]\n",
      "[-0.05057507 -0.17631163  0.06249246  0.37915938]\n",
      "[-0.05410131  0.01786976  0.07007565  0.10681669]\n",
      "[-0.05374391  0.21192118  0.07221198 -0.16296067]\n",
      "[-0.04950549  0.40593907  0.06895277 -0.43201717]\n",
      "[-0.04138671  0.209912    0.06031243 -0.11841924]\n",
      "[-0.03718847  0.40412028  0.05794404 -0.39148087]\n",
      "[-0.02910606  0.20822591  0.05011442 -0.08110646]\n",
      "[-0.02494154  0.40259499  0.04849229 -0.35756656]\n",
      "[-0.01688964  0.5969952   0.04134096 -0.63457295]\n",
      "[-0.00494974  0.79151686  0.0286495  -0.91395536]\n",
      "[ 0.0108806   0.98623982  0.0103704  -1.19749813]\n",
      "[ 0.0306054   1.18122603 -0.01357957 -1.48691287]\n",
      "[ 0.05422992  0.98627214 -0.04331782 -1.19850128]\n",
      "[ 0.07395536  0.79173666 -0.06728785 -0.91970325]\n",
      "[ 0.08979009  0.9877004  -0.08568191 -1.23275137]\n",
      "[ 0.1095441   1.18381318 -0.11033694 -1.55100107]\n",
      "[ 0.13322036  0.99017418 -0.14135696 -1.29468211]\n",
      "[ 0.15302385  0.79710261 -0.16725061 -1.04938406]\n",
      "[ 0.1689659   0.99400059 -0.18823829 -1.38955256]\n",
      "Episode #0 finished after 25 timesteps\n",
      "[-0.00013978  0.01719513  0.02057901  0.01339735]\n",
      "[ 2.04118871e-04  2.12016003e-01  2.08469612e-02 -2.72722271e-01]\n",
      "[ 0.00444444  0.40683438  0.01539252 -0.55875778]\n",
      "[ 0.01258113  0.21149979  0.00421736 -0.26126538]\n",
      "[ 0.01681112  0.01631789 -0.00100795  0.03274476]\n",
      "[ 0.01713748 -0.17878959 -0.00035305  0.3251095 ]\n",
      "[ 0.01356169 -0.37390651  0.00614914  0.61768107]\n",
      "[ 0.00608356 -0.56911382  0.01850276  0.91229433]\n",
      "[-0.00529872 -0.76448115  0.03674865  1.21073467]\n",
      "[-0.02058834 -0.56985244  0.06096334  0.92979037]\n",
      "[-0.03198539 -0.375604    0.07955915  0.65687111]\n",
      "[-0.03949747 -0.57173803  0.09269657  0.97350728]\n",
      "[-0.05093223 -0.76797317  0.11216671  1.29381004]\n",
      "[-0.06629169 -0.9643275   0.13804292  1.61939858]\n",
      "[-0.08557824 -1.16077989  0.17043089  1.95172979]\n",
      "Episode #1 finished after 14 timesteps\n",
      "[-6.01014551e-05  1.59677824e-03  9.37055543e-03  5.60017578e-03]\n",
      "[-2.81658903e-05  1.96583097e-01  9.48255895e-03 -2.84111559e-01]\n",
      "[0.0039035  0.00132719 0.00380033 0.01154696]\n",
      "[ 0.00393004 -0.19384905  0.00403127  0.30542651]\n",
      "[5.30588111e-05 1.21521824e-03 1.01397972e-02 1.40176687e-02]\n",
      "[ 7.73631760e-05 -1.94050672e-01  1.04201506e-02  3.09882496e-01]\n",
      "[-0.00380365  0.00092128  0.0166178   0.02050392]\n",
      "[-0.00378522  0.19580102  0.01702788 -0.26688992]\n",
      "[0.0001308  0.00044024 0.01169008 0.03111472]\n",
      "[ 1.39600402e-04 -1.94847388e-01  1.23123748e-02  3.27462948e-01]\n",
      "[-3.75734736e-03  9.71294599e-05  1.88616338e-02  3.86880827e-02]\n",
      "[-0.0037554   0.1949436   0.0196354  -0.24798464]\n",
      "[ 0.00014347 -0.00045319  0.0146757   0.05082648]\n",
      "[ 1.34403339e-04  1.94455284e-01  1.56922322e-02 -2.37190249e-01]\n",
      "[ 0.00402351 -0.0008873   0.01094843  0.06040088]\n",
      "[ 0.00400576 -0.1961645   0.01215644  0.3565179 ]\n",
      "[ 8.24729729e-05 -1.21747610e-03  1.92868028e-02  6.76928732e-02]\n",
      "[ 5.81234509e-05  1.93622726e-01  2.06406602e-02 -2.18843106e-01]\n",
      "[ 0.00393058 -0.0017881   0.0162638   0.08027862]\n",
      "[ 0.00389482 -0.19713938  0.01786937  0.37804813]\n",
      "[-4.79716146e-05 -2.27570489e-03  2.54303332e-02  9.10526006e-02]\n",
      "[-9.34857124e-05 -1.97752754e-01  2.72513852e-02  3.91649079e-01]\n",
      "[-0.00404854 -0.39325064  0.03508437  0.69279794]\n",
      "[-0.01191355 -0.5888413   0.04894033  0.99631608]\n",
      "[-0.02369038 -0.39440678  0.06886665  0.71939621]\n",
      "[-0.03157852 -0.20030187  0.08325457  0.44915978]\n",
      "[-0.03558455 -0.00645012  0.09223777  0.1838379 ]\n",
      "[-0.03571355 -0.2027625   0.09591452  0.50413419]\n",
      "[-0.0397688  -0.00911389  0.10599721  0.24315202]\n",
      "[-0.03995108  0.18434692  0.11086025 -0.01430565]\n",
      "[-0.03626414 -0.01217591  0.11057414  0.31119519]\n",
      "[-0.03650766 -0.20868525  0.11679804  0.63660164]\n",
      "[-0.04068137 -0.01536918  0.12953007  0.38286376]\n",
      "[-0.04098875  0.17769861  0.13718735  0.13366361]\n",
      "[-0.03743478  0.37061606  0.13986062 -0.11278708]\n",
      "[-0.03002246  0.56348573  0.13760488 -0.35828155]\n",
      "[-0.01875274  0.36670325  0.13043925 -0.02556971]\n",
      "[-0.01141868  0.55973678  0.12992785 -0.27441975]\n",
      "[-2.23942654e-04  7.52788675e-01  1.24439458e-01 -5.23465406e-01]\n",
      "[ 0.01483183  0.55615528  0.11397015 -0.19430519]\n",
      "[0.02595494 0.35960314 0.11008405 0.13204504]\n",
      "[ 0.033147    0.55299012  0.11272495 -0.12398017]\n",
      "[ 0.0442068   0.74633178  0.11024534 -0.37908103]\n",
      "[ 0.05913344  0.93972956  0.10266372 -0.63507004]\n",
      "[ 0.07792803  1.13328106  0.08996232 -0.8937385 ]\n",
      "[ 0.10059365  1.32707533  0.07208755 -1.15684112]\n",
      "[ 0.12713516  1.13109155  0.04895073 -0.84245357]\n",
      "[ 0.14975699  0.93533687  0.03210166 -0.53478751]\n",
      "[ 0.16846372  1.12999304  0.02140591 -0.81718505]\n",
      "[ 0.19106359  0.93458468  0.00506221 -0.51784682]\n",
      "[ 0.20975528  1.12963499 -0.00529473 -0.80893025]\n",
      "[ 0.23234798  0.93458599 -0.02147333 -0.5179175 ]\n",
      "[ 0.2510397   1.1300036  -0.03183168 -0.817289  ]\n",
      "[ 0.27363977  1.32554652 -0.04817746 -1.11981173]\n",
      "[ 0.3001507   1.13108852 -0.0705737  -0.84262187]\n",
      "[ 0.32277247  1.32709908 -0.08742614 -1.15663728]\n",
      "[ 0.34931445  1.52324509 -0.11055888 -1.47540316]\n",
      "[ 0.37977936  1.71953057 -0.14006695 -1.8004719 ]\n",
      "[ 0.41416997  1.91591351 -0.17607638 -2.13320661]\n",
      "Episode #2 finished after 58 timesteps\n",
      "[0.0007869  0.0167783  0.01319091 0.03597034]\n",
      "[ 0.00112246  0.21170863  0.01391031 -0.25252171]\n",
      "[0.00535664 0.01639084 0.00885988 0.04451613]\n",
      "[ 0.00568445 -0.17885703  0.0097502   0.33998121]\n",
      "[0.00210731 0.01612484 0.01654983 0.05038882]\n",
      "[ 0.00242981  0.21100562  0.0175576  -0.23702689]\n",
      "[0.00664992 0.01563729 0.01281706 0.06114211]\n",
      "[ 0.00696267 -0.17966606  0.01403991  0.35784119]\n",
      "[ 0.00336935 -0.37498477  0.02119673  0.65491797]\n",
      "[-0.00413035 -0.57039533  0.03429509  0.95419933]\n",
      "[-0.01553825 -0.37575112  0.05337908  0.67248548]\n",
      "[-0.02305328 -0.18141019  0.06682879  0.39707529]\n",
      "[-0.02668148 -0.37741349  0.07477029  0.71005699]\n",
      "[-0.03422975 -0.57348696  0.08897143  1.02530745]\n",
      "[-0.04569949 -0.7696737   0.10947758  1.34454669]\n",
      "[-0.06109296 -0.96598903  0.13636851  1.66937891]\n",
      "[-0.08041274 -0.77269042  0.16975609  1.42209102]\n",
      "[-0.09586655 -0.96945617  0.19819791  1.76266611]\n",
      "Episode #3 finished after 17 timesteps\n",
      "[-0.01539359 -0.01233658 -0.02751178 -0.00364429]\n",
      "[-0.01564033 -0.20705339 -0.02758466  0.28023295]\n",
      "[-0.01978139 -0.01154903 -0.02198    -0.02102089]\n",
      "[-0.02001237  0.18388112 -0.02240042 -0.32055691]\n",
      "[-0.01633475  0.3793148  -0.02881156 -0.62021906]\n",
      "[-0.00874846  0.18460684 -0.04121594 -0.33674775]\n",
      "[-0.00505632  0.38029035 -0.0479509  -0.64213779]\n",
      "[ 0.00254949  0.57604674 -0.06079365 -0.94952703]\n",
      "[ 0.01407042  0.38179354 -0.07978419 -0.67654784]\n",
      "[ 0.02170629  0.57792811 -0.09331515 -0.9932455 ]\n",
      "[ 0.03326486  0.38417009 -0.11318006 -0.7312678 ]\n",
      "[ 0.04094826  0.58065908 -0.12780542 -1.05731951]\n",
      "[ 0.05256144  0.77722139 -0.14895181 -1.38723081]\n",
      "[ 0.06810587  0.58423653 -0.17669642 -1.14458894]\n",
      "[ 0.0797906   0.39180663 -0.1995882  -0.91211721]\n",
      "Episode #4 finished after 14 timesteps\n",
      "[-0.00489721  0.0129913   0.04196769 -0.03592547]\n",
      "[-0.00463739  0.20748708  0.04124918 -0.31507737]\n",
      "[-0.00048765  0.01180257  0.03494764 -0.00967647]\n",
      "[-2.51593818e-04  2.06406347e-01  3.47541071e-02 -2.91131382e-01]\n",
      "[0.00387653 0.01080653 0.02893148 0.01230683]\n",
      "[ 0.00409266 -0.18471814  0.02917762  0.31397576]\n",
      "[ 3.98300861e-04 -3.80243319e-01  3.54571312e-02  6.15715695e-01]\n",
      "[-0.00720657 -0.18563423  0.04777145  0.33440776]\n",
      "[-0.01091925  0.00877642  0.0544596   0.05716393]\n",
      "[-0.01074372  0.20307693  0.05560288 -0.21785154]\n",
      "[-0.00668218  0.39736177  0.05124585 -0.49248958]\n",
      "[ 0.00126505  0.59172487  0.04139606 -0.76859182]\n",
      "[ 0.01309955  0.39605829  0.02602422 -0.46317642]\n",
      "[ 0.02102072  0.59080298  0.01676069 -0.74754429]\n",
      "[ 0.03283677  0.39545385  0.00180981 -0.44963434]\n",
      "[ 0.04074585  0.20030635 -0.00718288 -0.15638148]\n",
      "[ 0.04475198  0.3955304  -0.01031051 -0.45132176]\n",
      "[ 0.05266259  0.20055578 -0.01933695 -0.1619066 ]\n",
      "[ 0.0566737   0.39594914 -0.02257508 -0.46062656]\n",
      "[ 0.06459269  0.20115343 -0.03178761 -0.17514395]\n",
      "[ 0.06861575  0.00650049 -0.03529049  0.10734398]\n",
      "[ 0.06874576 -0.18809844 -0.03314361  0.38868736]\n",
      "[ 0.06498379  0.00747789 -0.02536986  0.08574154]\n",
      "[ 0.06513335  0.20295414 -0.02365503 -0.21483641]\n",
      "[ 0.06919244  0.39840615 -0.02795176 -0.51488633]\n",
      "[ 0.07716056  0.59391036 -0.03824948 -0.81624484]\n",
      "[ 0.08903877  0.39933239 -0.05457438 -0.53583397]\n",
      "[ 0.09702541  0.59517755 -0.06529106 -0.84520125]\n",
      "[ 0.10892896  0.7911267  -0.08219509 -1.15768065]\n",
      "[ 0.1247515   0.59716656 -0.1053487  -0.89186084]\n",
      "[ 0.13669483  0.40361931 -0.12318592 -0.63406477]\n",
      "[ 0.14476722  0.60022471 -0.13586721 -0.96286325]\n",
      "[ 0.15677171  0.40716406 -0.15512448 -0.71576326]\n",
      "[ 0.16491499  0.60405408 -0.16943974 -1.0529759 ]\n",
      "[ 0.17699607  0.41153406 -0.19049926 -0.81790864]\n",
      "[ 0.18522675  0.60868166 -0.20685743 -1.16394903]\n",
      "Episode #5 finished after 35 timesteps\n",
      "[-0.04771483 -0.04288668  0.03429598  0.04314462]\n",
      "[-0.04857257  0.15172713  0.03515887 -0.23852337]\n",
      "[-0.04553802 -0.043879    0.03038841  0.06503907]\n",
      "[-0.0464156   0.15079438  0.03168919 -0.21790336]\n",
      "[-0.04339972  0.34544933  0.02733112 -0.50042414]\n",
      "[-0.03649073  0.54017555  0.01732264 -0.78437003]\n",
      "[-0.02568722  0.73505523  0.00163524 -1.0715531 ]\n",
      "[-0.01098611  0.93015553 -0.01979582 -1.36372239]\n",
      "[ 0.007617    1.12551979 -0.04707027 -1.66253096]\n",
      "[ 0.03012739  0.93097653 -0.08032089 -1.38487283]\n",
      "[ 0.04874692  1.12700299 -0.10801835 -1.7015535 ]\n",
      "[ 0.07128698  0.93327839 -0.14204942 -1.4443559 ]\n",
      "[ 0.08995255  1.12983384 -0.17093654 -1.77784173]\n",
      "[ 0.11254923  0.93699964 -0.20649337 -1.54281696]\n",
      "Episode #6 finished after 13 timesteps\n",
      "[-0.01190403  0.00938134 -0.03520526  0.04898776]\n",
      "[-0.0117164  -0.18521858 -0.0342255   0.33035855]\n",
      "[-0.01542078  0.01037343 -0.02761833  0.02708202]\n",
      "[-0.01521331 -0.18434179 -0.02707669  0.31092471]\n",
      "[-0.01890014  0.01115525 -0.0208582   0.00982706]\n",
      "[-0.01867704 -0.18366145 -0.02066166  0.29585668]\n",
      "[-0.02235027 -0.37848284 -0.01474452  0.58195231]\n",
      "[-0.02991992 -0.57339514 -0.00310548  0.86995424]\n",
      "[-0.04138783 -0.37823108  0.01429361  0.57629655]\n",
      "[-0.04895245 -0.57355045  0.02581954  0.87344785]\n",
      "[-0.06042346 -0.3787889   0.0432885   0.58899301]\n",
      "[-0.06799923 -0.184299    0.05506836  0.31025429]\n",
      "[-0.07168521  0.00999689  0.06127344  0.03543417]\n",
      "[-0.07148528  0.20418908  0.06198213 -0.23730435]\n",
      "[-0.0674015   0.0082389   0.05723604  0.07426757]\n",
      "[-0.06723672  0.20249559  0.05872139 -0.1998221 ]\n",
      "[-0.06318681  0.00658508  0.05472495  0.11079132]\n",
      "[-0.0630551   0.20088189  0.05694077 -0.16413647]\n",
      "[-0.05903747  0.004993    0.05365804  0.14595219]\n",
      "[-0.05893761  0.19930711  0.05657709 -0.12933158]\n",
      "[-0.05495146  0.39357487  0.05399046 -0.40364216]\n",
      "[-0.04707997  0.19773043  0.04591761 -0.09443808]\n",
      "[-0.04312536  0.3921652   0.04402885 -0.37228779]\n",
      "[-0.03528205  0.58663491  0.0365831  -0.65076937]\n",
      "[-0.02354936  0.78122874  0.02356771 -0.93171178]\n",
      "[-0.00792478  0.97602487  0.00493347 -1.21689656]\n",
      "[ 0.01159572  1.17108284 -0.01940446 -1.50802953]\n",
      "[ 0.03501737  1.36643458 -0.04956505 -1.80670652]\n",
      "[ 0.06234606  1.56207329 -0.08569918 -2.11437035]\n",
      "[ 0.09358753  1.75793996 -0.12798659 -2.43225686]\n",
      "[ 0.12874633  1.56412791 -0.17663172 -2.18144201]\n",
      "Episode #7 finished after 30 timesteps\n",
      "[-0.01380046 -0.00100016  0.009519    0.03887965]\n",
      "[-0.01382047 -0.19625731  0.01029659  0.33455065]\n",
      "[-0.01774561 -0.00128341  0.0169876   0.04513244]\n",
      "[-0.01777128 -0.19664478  0.01789025  0.34312633]\n",
      "[-0.02170418 -0.39201662  0.02475278  0.64139665]\n",
      "[-0.02954451 -0.58747472  0.03758071  0.94177056]\n",
      "[-0.041294   -0.39287881  0.05641612  0.66112893]\n",
      "[-0.04915158 -0.19858539  0.0696387   0.38673001]\n",
      "[-0.05312329 -0.39462323  0.0773733   0.70053128]\n",
      "[-0.06101575 -0.2006543   0.09138393  0.43317341]\n",
      "[-0.06502884 -0.00693698  0.1000474   0.17063939]\n",
      "[-0.06516758 -0.203338    0.10346018  0.49313263]\n",
      "[-0.06923434 -0.00981578  0.11332284  0.23476457]\n",
      "[-0.06943065 -0.20635909  0.11801813  0.5609342 ]\n",
      "[-0.07355783 -0.40292238  0.12923681  0.88834502]\n",
      "[-0.08161628 -0.20976882  0.14700371  0.63892082]\n",
      "[-0.08581166 -0.40660145  0.15978213  0.97404891]\n",
      "[-0.09394369 -0.21394178  0.17926311  0.73551688]\n",
      "[-0.09822252 -0.02168912  0.19397344  0.50418061]\n",
      "[-0.09865631  0.1702462   0.20405706  0.27834818]\n",
      "Episode #8 finished after 19 timesteps\n",
      "[-0.03976567  0.01645882 -0.01937947 -0.03352089]\n",
      "[-0.03943649 -0.17837994 -0.02004989  0.25298516]\n",
      "[-0.04300409  0.01702248 -0.01499018 -0.04595383]\n",
      "[-0.04266364  0.21235614 -0.01590926 -0.34332835]\n",
      "[-0.03841652  0.40770076 -0.02277583 -0.64098533]\n",
      "[-0.0302625   0.2129036  -0.03559553 -0.35556089]\n",
      "[-0.02600443  0.01830536 -0.04270675 -0.0743113 ]\n",
      "[-0.02563832  0.21401271 -0.04419298 -0.38015663]\n",
      "[-0.02135807  0.40973345 -0.05179611 -0.6864396 ]\n",
      "[-0.0131634   0.6055347  -0.0655249  -0.99496884]\n",
      "[-1.05270742e-03  8.01468979e-01 -8.54242786e-02 -1.30749009e+00]\n",
      "[ 0.01497667  0.99756334 -0.11157408 -1.62564326]\n",
      "[ 0.03492794  0.80391686 -0.14408695 -1.36971444]\n",
      "[ 0.05100628  1.00051734 -0.17148123 -1.70377435]\n",
      "[ 0.07101662  0.80773423 -0.20555672 -1.46900976]\n",
      "Episode #9 finished after 14 timesteps\n",
      "[-0.04966121  0.01112644 -0.03397096  0.04493339]\n",
      "[-0.04943868  0.20671863 -0.03307229 -0.25827117]\n",
      "[-0.04530431  0.01208406 -0.03823772  0.02379956]\n",
      "[-0.04506263  0.20773292 -0.03776173 -0.28069836]\n",
      "[-0.04090797  0.0131694  -0.04337569 -0.00016044]\n",
      "[-0.04064458 -0.18130451 -0.0433789   0.27852758]\n",
      "[-0.04427067  0.01440857 -0.03780835 -0.02751526]\n",
      "[-0.0439825  -0.18015136 -0.03835866  0.25300306]\n",
      "[-0.04758553  0.01549672 -0.03329859 -0.05152786]\n",
      "[-0.0472756   0.21107992 -0.03432915 -0.35452801]\n",
      "[-0.043054    0.40667275 -0.04141971 -0.65783518]\n",
      "[-0.03492054  0.60234601 -0.05457642 -0.96326725]\n",
      "[-0.02287362  0.7981571  -0.07384176 -1.27258406]\n",
      "[-0.00691048  0.99413959 -0.09929344 -1.58744646]\n",
      "[ 0.01297231  0.80032796 -0.13104237 -1.32730527]\n",
      "[ 0.02897887  0.60708074 -0.15758848 -1.07833603]\n",
      "[ 0.04112049  0.80389279 -0.1791552  -1.41603542]\n",
      "[ 0.05719834  0.61138426 -0.20747591 -1.18428471]\n",
      "Episode #10 finished after 17 timesteps\n",
      "[-0.03005425  0.00851172  0.03512236 -0.00364055]\n",
      "[-0.02988402  0.20311282  0.03504954 -0.28503833]\n",
      "[-0.02582176  0.00750897  0.02934878  0.0184897 ]\n",
      "[-0.02567158  0.202198    0.02971857 -0.26479078]\n",
      "[-0.02162762  0.39688345  0.02442276 -0.54795403]\n",
      "[-0.01368995  0.20142706  0.01346368 -0.24767731]\n",
      "[-0.00966141  0.39635417  0.00851013 -0.53608327]\n",
      "[-0.00173433  0.59135543 -0.00221154 -0.82607262]\n",
      "[ 0.01009278  0.78650756 -0.01873299 -1.11945029]\n",
      "[ 0.02582293  0.98187019 -0.04112199 -1.41794999]\n",
      "[ 0.04546034  0.78728074 -0.06948099 -1.13839904]\n",
      "[ 0.06120595  0.98323906 -0.09224898 -1.45203828]\n",
      "[ 0.08087073  0.7893635  -0.12128974 -1.18954419]\n",
      "[ 0.096658    0.98583043 -0.14508062 -1.51765137]\n",
      "[ 0.11637461  0.79273053 -0.17543365 -1.27354873]\n",
      "[ 0.13222922  0.60022615 -0.20090463 -1.04053764]\n",
      "Episode #11 finished after 15 timesteps\n",
      "[ 0.00505391 -0.02260334  0.01521549  0.02997197]\n",
      "[ 0.00460185 -0.21794014  0.01581493  0.32741645]\n",
      "[ 0.00024304 -0.02304688  0.02236325  0.0397625 ]\n",
      "[-2.17893116e-04  1.71747367e-01  2.31585040e-02 -2.45781559e-01]\n",
      "[ 0.00321705 -0.02369756  0.01824287  0.05411521]\n",
      "[ 0.0027431  -0.21907627  0.01932518  0.35249762]\n",
      "[-0.00163842 -0.41446763  0.02637513  0.65121117]\n",
      "[-0.00992777 -0.6099468   0.03939935  0.95208134]\n",
      "[-0.02212671 -0.41537656  0.05844098  0.67203289]\n",
      "[-0.03043424 -0.61126011  0.07188164  0.98252846]\n",
      "[-0.04265944 -0.41717111  0.09153221  0.71326179]\n",
      "[-0.05100287 -0.6134331   0.10579744  1.03329531]\n",
      "[-0.06327153 -0.80979082  0.12646335  1.3572315 ]\n",
      "[-0.07946734 -0.61646163  0.15360798  1.10663476]\n",
      "[-0.09179658 -0.81323219  0.17574067  1.44329863]\n",
      "[-0.10806122 -0.62065446  0.20460665  1.21028356]\n",
      "Episode #12 finished after 15 timesteps\n",
      "[ 0.01596705 -0.03441572 -0.00306384  0.01513224]\n",
      "[ 0.01527873 -0.2294936  -0.0027612   0.30684691]\n",
      "[ 0.01068886 -0.03433241  0.00337574  0.01329446]\n",
      "[ 0.01000221  0.16074097  0.00364163 -0.27832147]\n",
      "[ 0.01321703  0.35581078 -0.0019248  -0.56985361]\n",
      "[ 0.02033325  0.55095967 -0.01332187 -0.8631423 ]\n",
      "[ 0.03135244  0.74626044 -0.03058472 -1.15998397]\n",
      "[ 0.04627765  0.94176722 -0.0537844  -1.4620975 ]\n",
      "[ 0.06511299  1.13750552 -0.08302635 -1.77108538]\n",
      "[ 0.0878631   0.94341271 -0.11844805 -1.50533076]\n",
      "[ 0.10673136  1.13975567 -0.14855467 -1.83252396]\n",
      "[ 0.12952647  0.94655777 -0.18520515 -1.58943354]\n",
      "Episode #13 finished after 11 timesteps\n",
      "[-0.00133534  0.00333573  0.00025281  0.04758702]\n",
      "[-0.00126862  0.19845405  0.00120455 -0.24501613]\n",
      "[ 0.00270046  0.00331492 -0.00369577  0.0480465 ]\n",
      "[ 0.00276676 -0.19175385 -0.00273484  0.33956109]\n",
      "[-0.00106832  0.00340691  0.00405638  0.046017  ]\n",
      "[-0.00100018 -0.19177297  0.00497672  0.33997699]\n",
      "[-0.00483564  0.00327782  0.01177626  0.04886758]\n",
      "[-0.00477008 -0.192011    0.01275361  0.34524263]\n",
      "[-0.0086103  -0.38731203  0.01965846  0.64191982]\n",
      "[-0.01635654 -0.19246954  0.03249686  0.35549177]\n",
      "[-0.02020593 -0.3880381   0.0396067   0.65824213]\n",
      "[-0.0279667  -0.5836883   0.05277154  0.96312859]\n",
      "[-0.03964046 -0.3893136   0.07203411  0.68748005]\n",
      "[-0.04742673 -0.19526151  0.08578371  0.41831651]\n",
      "[-0.05133196 -0.39148768  0.09415004  0.73676291]\n",
      "[-0.05916172 -0.58777519  0.1088853   1.05752942]\n",
      "[-0.07091722 -0.39425115  0.13003589  0.80091155]\n",
      "[-0.07880224 -0.59089397  0.14605412  1.13150837]\n",
      "[-0.09062012 -0.39795422  0.16868429  0.88796748]\n",
      "[-0.09857921 -0.20547364  0.18644364  0.65270289]\n",
      "[-0.10268868 -0.40263519  0.19949769  0.99781731]\n",
      "Episode #14 finished after 20 timesteps\n",
      "[-0.04568981 -0.04968115  0.02945408 -0.03379859]\n",
      "[-0.04668343  0.14500631  0.02877811 -0.31704486]\n",
      "[-0.0437833  -0.05051346  0.02243721 -0.01542704]\n",
      "[-0.04479357  0.14427964  0.02212867 -0.30094717]\n",
      "[-0.04190798 -0.05115061  0.01610972 -0.00136827]\n",
      "[-0.04293099 -0.24649984  0.01608236  0.29635362]\n",
      "[-0.04786099 -0.44184733  0.02200943  0.59406496]\n",
      "[-0.05669794 -0.24704025  0.03389073  0.30839536]\n",
      "[-0.06163874 -0.44262829  0.04005864  0.61157093]\n",
      "[-0.07049131 -0.63828655  0.05229006  0.91659697]\n",
      "[-0.08325704 -0.83407508  0.070622    1.22524462]\n",
      "[-0.09993854 -1.03003175  0.09512689  1.53919254]\n",
      "[-0.12053917 -0.83617428  0.12591074  1.27764616]\n",
      "[-0.13726266 -0.64286211  0.15146366  1.02689342]\n",
      "[-0.1501199  -0.83964018  0.17200153  1.36304148]\n",
      "[-0.16691271 -0.64703922  0.19926236  1.1287225 ]\n",
      "Episode #15 finished after 15 timesteps\n",
      "[-0.03874708  0.02733863 -0.03451941 -0.03920232]\n",
      "[-0.03820031  0.22293814 -0.03530346 -0.3425735 ]\n",
      "[-0.03374155  0.02833577 -0.04215493 -0.06122889]\n",
      "[-0.03317483  0.22403599 -0.0433795  -0.3669083 ]\n",
      "[-0.02869411  0.41974666 -0.05071767 -0.67294771]\n",
      "[-0.02029918  0.61553551 -0.06417663 -0.98115796]\n",
      "[-0.00798847  0.81145613 -0.08379978 -1.29328888]\n",
      "[ 0.00824066  1.00753717 -0.10966556 -1.61098663]\n",
      "[ 0.0283914   0.81386803 -0.14188529 -1.35440514]\n",
      "[ 0.04466876  1.01045691 -0.1689734  -1.68789968]\n",
      "[ 0.0648779   1.207082   -0.20273139 -2.0280789 ]\n",
      "Episode #16 finished after 10 timesteps\n",
      "[-0.04674825  0.02373521 -0.0341643   0.03700735]\n",
      "[-0.04627355 -0.17088059 -0.03342416  0.31871834]\n",
      "[-0.04969116 -0.36551095 -0.02704979  0.60067596]\n",
      "[-0.05700138 -0.56024425 -0.01503627  0.88471739]\n",
      "[-0.06820627 -0.36492141  0.00265808  0.58734574]\n",
      "[-0.07550469 -0.16983678  0.01440499  0.29550131]\n",
      "[-0.07890143  0.02507688  0.02031502  0.00739608]\n",
      "[-0.07839989  0.21990168  0.02046294 -0.27880862]\n",
      "[-0.07400186  0.41472583  0.01488677 -0.56496798]\n",
      "[-0.06570734  0.21939821  0.00358741 -0.26763247]\n",
      "[-0.06131938  0.41446878 -0.00176524 -0.55918175]\n",
      "[-0.05303     0.21937165 -0.01294888 -0.26705549]\n",
      "[-0.04864257  0.02443688 -0.01828999  0.02151529]\n",
      "[-0.04815383 -0.17041807 -0.01785968  0.30837187]\n",
      "[-0.05156219  0.02495375 -0.01169224  0.01011041]\n",
      "[-0.05106312  0.22024141 -0.01149004 -0.2862385 ]\n",
      "[-0.04665829  0.0252852  -0.01721481  0.00279852]\n",
      "[-0.04615259 -0.16958569 -0.01715884  0.29000064]\n",
      "[-0.0495443  -0.36445882 -0.01135882  0.57722285]\n",
      "[-5.68334758e-02 -5.59419733e-01  1.85634671e-04  8.66305937e-01]\n",
      "[-0.06802187 -0.75454421  0.01751175  1.15904722]\n",
      "[-0.08311275 -0.94988992  0.0406927   1.45716906]\n",
      "[-0.10211055 -0.75529021  0.06983608  1.17747148]\n",
      "[-0.11721636 -0.95124626  0.09338551  1.4912042 ]\n",
      "[-0.13624128 -1.14737263  0.12320959  1.81152736]\n",
      "[-0.15918874 -1.34363339  0.15944014  2.13981884]\n",
      "[-0.1860614  -1.53993212  0.20223652  2.47720929]\n",
      "Episode #17 finished after 26 timesteps\n",
      "[ 0.0221508   0.00033926  0.04711769 -0.04154291]\n",
      "[ 0.02215758  0.194755    0.04628683 -0.31899538]\n",
      "[ 0.02605268 -0.00099457  0.03990692 -0.01208205]\n",
      "[ 0.02603279  0.19353302  0.03966528 -0.29191158]\n",
      "[ 0.02990345 -0.00213139  0.03382705  0.0130127 ]\n",
      "[ 0.02986083 -0.19772172  0.03408731  0.31617362]\n",
      "[ 0.02590639 -0.00310147  0.04041078  0.03443256]\n",
      "[ 0.02584436 -0.19877892  0.04109943  0.33958665]\n",
      "[ 0.02186878 -0.00426514  0.04789116  0.06014219]\n",
      "[ 0.02178348  0.1901386   0.04909401 -0.21705446]\n",
      "[ 0.02558625  0.3845256   0.04475292 -0.4938558 ]\n",
      "[ 0.03327676  0.57898876  0.0348758  -0.77210551]\n",
      "[ 0.04485654  0.77361391  0.01943369 -1.0536143 ]\n",
      "[ 0.06032882  0.96847288 -0.0016386  -1.34013443]\n",
      "[ 0.07969828  0.7733716  -0.02844128 -1.04796464]\n",
      "[ 0.09516571  0.96885918 -0.04940058 -1.34943804]\n",
      "[ 0.11454289  0.77439153 -0.07638934 -1.07261028]\n",
      "[ 0.13003072  0.58035788 -0.09784154 -0.80484522]\n",
      "[ 0.14163788  0.77667528 -0.11393845 -1.12663247]\n",
      "[ 0.15717138  0.97309061 -0.1364711  -1.45277061]\n",
      "[ 0.1766332   1.16959928 -0.16552651 -1.78479106]\n",
      "[ 0.20002518  0.97667927 -0.20122233 -1.54780924]\n",
      "Episode #18 finished after 21 timesteps\n",
      "[ 0.02919326  0.01170897 -0.00448211 -0.03041331]\n",
      "[ 0.02942743  0.20689491 -0.00509038 -0.32450702]\n",
      "[ 0.03356533  0.01184581 -0.01158052 -0.03343372]\n",
      "[ 0.03380225  0.20713189 -0.01224919 -0.3297478 ]\n",
      "[ 0.03794489  0.01218644 -0.01884415 -0.04095273]\n",
      "[ 0.03818862  0.20757347 -0.0196632  -0.33952109]\n",
      "[ 0.04234008  0.01273675 -0.02645362 -0.05310311]\n",
      "[ 0.04259482  0.20822782 -0.02751569 -0.35401359]\n",
      "[ 0.04675938  0.01350771 -0.03459596 -0.07013257]\n",
      "[ 0.04702953  0.20910813 -0.03599861 -0.37352683]\n",
      "[ 0.05121169  0.40472247 -0.04346915 -0.67733954]\n",
      "[ 0.05930614  0.21023057 -0.05701594 -0.39865317]\n",
      "[ 0.06351075  0.40611306 -0.064989   -0.70875307]\n",
      "[ 0.07163301  0.60207213 -0.07916406 -1.02116446]\n",
      "[ 0.08367446  0.40808908 -0.09958735 -0.75435111]\n",
      "[ 0.09183624  0.21447078 -0.11467437 -0.49459296]\n",
      "[ 0.09612565  0.41100736 -0.12456623 -0.821102  ]\n",
      "[ 0.1043458   0.21778996 -0.14098827 -0.57004942]\n",
      "[ 0.1087016   0.41457832 -0.15238926 -0.9036164 ]\n",
      "[ 0.11699317  0.6113993  -0.17046159 -1.24005573]\n",
      "[ 0.12922115  0.41882592 -0.1952627  -1.0052556 ]\n",
      "Episode #19 finished after 20 timesteps\n"
     ]
    }
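   ],
   "source": [
    "env = gym.make('CartPole-v0')\n",
    "for i_episode in range(20):\n",
    "    # Start every episode from a fresh initial state.\n",
    "    observation = env.reset()\n",
    "    for t in range(1000):\n",
    "        env.render()\n",
    "        print(observation)\n",
    "        # Choose a random action and advance the environment by one step.\n",
    "        action = env.action_space.sample()\n",
    "        observation, reward, done, info = env.step(action)\n",
    "\n",
    "        if done:\n",
    "            # Note: t is zero-based, so t + 1 steps were actually taken.\n",
    "            print('Episode #%d finished after %d timesteps' % (i_episode, t))\n",
    "            break"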
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding agent actions\n",
    "Every gym environment has `Space` objects that describe its valid actions and observations: `env.action_space` and `env.observation_space`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Discrete(2)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.action_space"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Box(4,)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.observation_space"
   ]
  },
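  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Discrete(n)` describes a set of `n` non-negative integers, `{0, 1, ..., n-1}`. <br>\n",
    "For example, `Discrete(3)` means the action can take the values `{0, 1, 2}`, so CartPole's `Discrete(2)` allows the actions `0` and `1`. <br>\n",
    "`Box` represents an n-dimensional array of real numbers (here, n=4), so each CartPole observation is a vector of four numbers."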
   ]
  },
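  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next cell is a minimal sketch of how these spaces can be inspected. It builds a standalone `spaces.Discrete(3)` purely for illustration (that space is not part of CartPole) and assumes the `env` created above is still in scope. The attributes used (`n`, `contains`, `shape`, `low`, `high`) come from gym's `Space` classes; the exact printed values may differ between gym versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gym import spaces\n",
    "\n",
    "# Illustrative space (an assumption for this example, not CartPole's own):\n",
    "# Discrete(3) contains exactly the integers 0, 1 and 2.\n",
    "example_space = spaces.Discrete(3)\n",
    "print(example_space.n)            # number of valid actions\n",
    "print(example_space.contains(2))  # is 2 a valid action?\n",
    "print(example_space.contains(3))  # is 3 a valid action?\n",
    "\n",
    "# CartPole's observation space is a Box: inspect its shape and bounds.\n",
    "print(env.observation_space.shape)\n",
    "print(env.observation_space.low)\n",
    "print(env.observation_space.high)"
   ]
  },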
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
